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ABSTRACT 


A model  for  learning  systems  is  presented,  and  representative  Al , pattern 
recognition,  and  control  systems  are  discussed  in  terms  of  its  framework. 

The  model  details  the  functional  components  felt  to  be  essential  for  any 
learning  system,  independent  of  the  techniques  used  for  its  construction, 
and  the  specific  environment  in  which  it  operates.  These  components  are 
performance  element,  instance  selector,  critic,  learning  element,  blackboard, 
and  world  model.  Consideration  of  learning  system  design  leads  naturally 
to  the  concept  of  a layered  system,  each  layer  operating  at  a different 
level  of  abstraction. 
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Abstract 

A model  for  learning  systems  is  presented,  and  representative  AI, 
pattern  recognition,  and  control  systems  are  discussed  in  terms  of  its 
framework.  The  model  details  the  functional  components  felt  to  be 
essential  for  any  learning  system,  independent  of  the  techniques  used 
for  its  construction,  and  the  specific  environment  in  which  it  operates. 
These  components  are  performance  element,  instance  selector,  critic, 
learning  element,  blackboard,  and  world  model.  Consideration  of 
learning  system  design  leads  naturally  to  the  concept  of  a layered 
system,  each  layer  operating  at  a different  level  of  abstraction. 

Descriptive  Terms:  adaptation,  learning,  concept-formation, 

induction,  performance  element,  instance  selector,  critic,  learning 
element,  blackboard,  world  model,  multi-layered  systems. 


1 Introduction 

Adaptation,  learning,  concept-formation,  induction,  self- 
organization, and  self-repair  have  been  of  Interest  to  researchers  in  a 
number  of  fields  for  many  years.  Each  discipline  has  brought  a 
different  perspective  to  the  research,  and  the  result  has  been  a great 
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variety  of  learning  system  (LS)  models  and  descriptive  terminology.  We 
have  therefore  synthesized  a new  model  for  unified  characterization  of 
systems  constructed  from  these  different  perspectives.  The  model  is 
also  useful  as  a paradigm  for  new  learning  systems,  because  it  enables 
the  designer  to  isolate  the  functional  components,  and  the  information 
that  must  be  available  to  them. 

2 Two  Approaches  to  Learning 

In  this  section,  we  summarize  two  different  approaches  to  the 
construction  of  systems  that  can  be  said  to  learn.  The  first  approach 
centers  on  the  concept  of  an  adaptive  system  and  is  primarily  associated 
with  research  in  pattern  recognition  and  control  theory;  the  second  is 
that  of  artificial  intelligence  (AI). 

2. 1 The  Adaptive  System  Approach 

In  the  control  literature,  learning  is  generally  assumed  to  be 
synonymous  with  adaptation,  and  is  often  viewed  as  estimation  or 
successive  approximation  of  the  unknown  parameters  of  a structure  which 
is  chosen  by  the  LS  designer  to  represent  the  system  under  study  [8] 
[12].  Once  this  has  been  done,  control  techniques  known  to  be  suitable 
for  the  particular  chosen  structure  can  be  applied.  Thus  the  emphasis 
has  been  on  parameter  learning,  and  the  achievement  of  stable,  reliable 
performance  [30].  Problems  are  commonly  formulated  in  stochastic  terms, 
and  the  use  of  statistical  procedures  to  achieve  optimal  performance 
with  respect  to  some  performance  criterion  such  as  the  probability  of 
correct  pattern  classification,  or  mean  square  error,  is  standard  [39]. 

There  are  many  overlapping  and  sometimes  contradictory  definitions 
of  the  terms  "adaptive  system",  "learning  system",  "self-repairing 
system",  and  "self-organizing  system".  The  following  set,  formulated  by 
Glorioso  [131  serves  to  illustrate  the  main  features.  An  adaptive 
system  is  defined  as  a system  which  responds  acceptably  with  respect  to 
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some  performance  criterion  in  the  face  of  changes  in  the  environment  or 
Its  own  internal  structure.  A learning  system  is  a system  that  responds 
acceptably  within  some  time  interval  following  a change  in  its 
environment,  and  a self-repairing  system  is  one  that  responds  acceptably 
within  some  time  interval  following  a change  in  its  internal  structure^. 
Finally,  a self-organizing  system  is  an  adaptive  or  learning  system  in 
which  the  initial  state  is  unknown,  random,  or  unimportant. 

Other  terms  often  used  to  describe  learning  systems  in  the  pattern 
recognition  and  control  literature  are  "learning  with  teacher"  or 
"supervised  learning"  and  "learning  without  teacher"  or  "unsupervised 
learning"  [12]  [7].  Learning  with  ter.cher  assumes  the  existence  of  an 
external  entity  (usually  a human)  which  presents  the  system  with  a 
training  set  of  instances,  evaluates  the  performance  of  the  system  for 
those  instances,  and  provides  the  correct  responses.  Learning  without 
teacher  assumes  that  the  environment  provides  all  instances,  but  does 
not  provide  the  correct  responses.  Performance  is  to  be  evaluated  by  the 
system  itself.  Tsypkin  [33]  has  pointed  out  that  unsupervised  learning 
is  somewhat  of  an  illusion  in  the  sense  that  a teacher/designer  defines 
the  structure  which  determines  the  quality  of  operation  of  the  LS  at  the 
outset,  whether  or  not  he  is  present  during  the  actual  operation  of  the 
system . 

2.2  The  Artificial  Intelligence  Approach 

Although  early  AI  research  was  closely  tied  to  pattern 
recognition,  and  the  techniques  commonly  associated  with  the  adaptive 
systems  approach,  (see,  for  example  [28]  and  [3**])i  the  two  fields 
diverged  in  the  I960's,  and  are  now  quite  distinct.  Whereas  the  pattern 
recognition  and  control  research  emphasizes  adjustment  of  parameters,  AI 
re<5earch  emphasizes  construction  of  symbolic  structures,  based  on 
conceptual  relations.  For  example,  Felgenbaum's  EPAM  program  [9]  used  a 

^ Learning  and  self-repairing  systems  have  been  considered  as 
.'specialized  adaptive  systems  [30]. 
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discrimination  net  (i.e.,  a tree  of  tests  and  branches)  to  store  the 
relations  required  to  recall  nonsense  syllables  in  a rote  learning 
experiment  (see  [31],  [11],  and  [37]  for  further  examples). 

The  kind  of  learning  that  involves  only  estimation  of  unknown 
parameters  (i.e.,  the  parameter  learning  of  Section  2.1  ) has  been 
referred  to  as  terminal  learning  in  the  A1  literature  [22].  In  AI,  it 
is  commonly  believed  that  a learning  system  should  have  sufficient 
internal  structure  to  develop  a "strong  theory"  of  its  environment  [10] 
['9].  Much  emphasis  has  therefore  been  placed  on  building  "knowledge- 
based"  or  "expert"  systems  that  not  only  have  the  capacity  for  high 
performance,  but  can  also  explain  their  performance  in  symbolic  terms 
[6].  Concept-formation  systems  in  particular  stress  the  construction  of 
symbolic  descriptions  [19]. 

Winston  [38]  describes  various  levels  of  sophistication  In 
learning  systems:  learning  by  being  programmed,  learning  by  being  told, 
learning  from  a series  of  examples,  and  finally  learning  by  discovery. 
We  see  in  this  categorization  a gradual  shift  in  responsibility  from  the 
designer/teacher  to  the  learning  system/student.  At  the  highest  level, 
the  system  is  able  to  find  its  own  examples,  and  carry  on  autonomously. 


3 The  LS  Model 

We  are  concerned  with  the  functional  description  of  LS's  and  their 
interaction  with  the  environments  in  which  they  operate.  Many  of  the 
functional  components  of  an  LS  are  essential  to  intelligent  systems  in 
general,  as  noted  also  by  Simon  and  Lea  [291. 

3. 1 Environment 

The  environment  in  which  an  LS  operates  may  have  a profound  effect 
upon  its  design,  and  therefore  it  is  of  interest  to  consider  a few  major 
environment  classes.  LS  environments  can  be  divided  into  two  major 
categories:  those  that  provide  the  correct  response  for  each  training 
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'.nstance  (supervised  learning)  and  those  that  do  not  (unsupervised 
learning).  Supervised  learning  systems  operate  within  a stimulus- 
response  environment  in  which  the  desired  output  of  the  LS  is  available 
along  with  each  training  instance  [12]  [7].  Samuel's  "book  move" 
checkers  program  [26][27],  and  grammatical  inference  programs  [15] 
(i.e.,  programs  that  attempt  to  infer  the  rules  of  a grammar  from  sample 
sentences  generated  by  that  grammar)  are  further  examples  of  LS's 
operating  within  such  an  environment.  This  is  also  the  nature  of  the 
environment  for  automatic  programming  systems  [m]  which  construct 
programs  to  reproduce  (or  explain)  a set  of  sample  input/output  pairs. 

Unsupervised  LS's  operate  within  an  environment  of  instances  for 
which  the  correct  response  is  never  available.  The  version  of  Samuel's 
program  which  learns  by  playing  checkers  against  an  opponent  falls  into 
this  category  [26].  Learning  systems  operating  within  this  type  of 
environment  must  themselves  infer  the  correct  response  to  each  training 
instance  by  observation  of  system  performance  for  a series  of  instances. 
As  a result,  assignment  of  credit  or  blame  for  overall  performance  to 
individual  responses  is  a problem  for  these  systems  [21]. 

Environments  can  be  further  categorized  as  "noise-free"  or 
"noisy".  Noise-free  environments,  such  as  that  of  Winston's  structural 
description  learning  program  [37]  provide  instance/correct-response 
pairs  in  which  the  data  are  assumed  to  be  perfectly  reliable.  Noisy 
environments,  on  the  other  hand,  do  not  provide  such  perfect 
information,  as  is  usually  the  case  when  real  data  are  involved  (pattern 
recognition  and  control  systems  frequently  operate  within  noisy 
environments  [7]  [8]  [1]).  Environments  which  react  to  the  LS 
resDonses  in  some  way  that  is  not  under  the  control  of  the  system  can 
also  be  considered  to  fall  into  this  category.  The  opponent  in  a game 
for  example,  operates  on  the  response  of  the  LS  to  provide  the  next 
Instance  [35]. 
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3.2  The  Model  - Overview 

The  LS  model  is  shown  in  Figure  1.  The  PERFORMANCE  ELEMENT  is 
responsible  for  generating  an  output  in  response  to  a t'-aining  instance. 
The  INSTANCE  SELECTOR  selects  suitable  training  instances  from  the 
environment.  The  CRITIC  analyzes  the  output  of  the  performance  element 
in  terms  of  some  standard  of  performance.  The  LEARNING  ELEMENT  makes 
specific  changes  to  the  system  in  response  to  the  analysis  of  the 
critic.  Communication  among  the  functional  components  is  shown  via  a 
BLACKBOARD  to  ensure  that  each  functional  component  has  access  to  all 
required  system  information.  Finally,  the  LS  operates  within  a WORLD 
MODEL  containing  general  assumptions  and  methods  defining  the  domain  of 
activity  of  the  system. 

Existing  systems  can  seldom  be  partitioned  unambiguously  into  the 
functional  components  shown  in  Figure  1.  These  components  are  conceptual 
entities  which  simplify  the  characterization  of  existing  systems,  and 
will  assist  designers  in  the  construction  of  new  systems.  They 
correspond  to  functions  that  must  be  performed  to  effect  learning.  In 
many  existing  systems,  one  or  more  of  the  functions  are  fulfilled  by  a 
human  who  is  considered  to  be  part  of  the  LS. 

In  the  following  sections,  we  present  detailed  discussions  of  the 
LS  model  components  shown  in  Figure  1.  In  addition.  Appendix  I 
contains  detailed  characterizations  of  representative  AI,  pattern 
recognition,  and  control  systems  in  terras  of  the  model.  The  reader  may 
find  it  helpful  to  refer  occasionally  to  this  appendix  while  reading  the 
following  sections. 
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Figure  1.  The  Components  of  a Learning  System 


3.3  Performance  Element 

The  performance  element  is  the  mechanism  that  uses  the  learned 
information  to  perform  the  stated  task.  It  has  been  included  in  the  LS 
model  because  of  the  intimate  relationship  between  what  is  to  be  learned 
by  an  LS  and  the  way  in  which  the  learned  information  is  to  be  used. 

Performance  elements  are  usually  tailored  more  to  the  requirements 
of  the  task  domain  than  to  the  architecture  of  the  LS.  In  general,  the 
performance  element  can  be  run  in  a stand-alone  mode  without  learning, 
Independent  from  the  rest  of  the  LS  (e.g.,  Samuel's  checker  playing 
program  [26]  [27]).  In  any  LS,  however,  the  ability  to  improve 
performance  presupposes  a method  of  communicating  learned  information  to 
the  performance  element.  Therefore,  the  architecture  of  the  performance 
element  must  allow  learned  information  to  affect  its  decisions,  and  so 
additional  constraints  are  placed  on  any  performance  element  that  is  to 
be  used  as  a component  of  an  LS.  The  performance  element  must  be 
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constructed  so  that  information  about  its  internal  machinations  is 
readily  available  to  the  other  system  components.  This  information  can 
be  used  to  make  possible  detailed  criticism  of  performance,  and 
intelligent  selection  of  further  instances  to  be  examined  by  the  system. 

The  performance  elements  of  existing  systems  vary  in  the  number  of 
ways  in  which  they  may  be  altered  by  learning.  Systems  which  adjust 
parameter.s  as  their  sole  learning  method  are  relatively  limited  in  the 
performance  variations  they  can  exhibit  [17]  [20],  whereas  systems  whose 
operation  is  determined  by  a set  of  production  rules  can  exhibit  greater 
variations  [7S]  [36]. 

1.4  Instance  Selector 

The  instance  selector  is  a mechanism  that  selects  training 
instances  from  M.e  environment  that  are  to  be  used  by  the  system  in 
learning.  It  is  a functional  component  not  clearly  isolated  in  earlier 
adaptive  system  models  [12]  [30]  [13]. 

In  reviewing  existing  LS's  we  have  found  that  methods  for  instance 
selection  vary  mainly  along  the  dimensions  of  responsibility  and 
sophistication.  The  responsibility  for  instance  selection  varies 
between  the  extremes  of  completely  external  ("passive")  selection,  and 
completely  internal  ("active")  selection.  Instance  selection  in 
Samuel's  book  move  checkers  program  [26]  [27]  is  externally  controlled, 
whereas  Popplestone 's  program  [2*1],  which  learns  the  features  that 
characterize  a winning  position  in  tic-tac-toe,  generates  its  own 
training  Instances.  It  forms  alternate  hypotheses,  and  then  generates 
instances  to  choose  among  them  (relying  upon  an  external  critic  to 
evaluate  these  instances).  In  the  adaptive  systems  literature,  Tse  and 
Bar-Shalom  [32]  discuss  the  use  of  "dual-control"  in  an  attempt  to 
identify  the  parameters  of  a system  at  the  same  time  as  it  is  being 
controlled. 

The  degree  of  sophistication  used  for  LS  instance  selection  is 
also  an  important  consideration.  In  order  to  qualify  as  sophisticated, 
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an  instance  selector  must  be  sensitive  to  the  current  abilities  and 
deficiencies  of  the  performance  element  and  must  construct  or  select 
instances  which  are  designed  to  improve  performance.  Winston  [37]  has 
shown  the  advantages  to  be  accrued  through  presenting  carefully 
constructed  examples  and  "near-misses"  of  the  concepts  to  be  acquired  by 
an  LS.  In  general,  careful  instance  selection  can  improve  the 
reliability  and  efficiency  of  an  LS.  We  must  note,  however,  that  this 
may  not  always  be  permitted  by  the  environment  in  which  the  LS  operates, 
as  is  generally  the  case  for  adaptive  control  systems  [8]. 

3.5  Critic 

The  critic  may  play  three  roles;  EVALUATOR,  DIAGNOSTICIAN,  and 
THERAPIST.  It  always  operates  as  an  evaluator,  in  that  it  embodies  a 
standard  by  which  to  assess  the  behaviour  of  the  performance  element. 
This  is  the  role  that  has  been  emphasized  in  earlier  adaptive  system 
models  [12]  [30]  [131. 

The  critic  may  also  operate  as  a diagnostician,  and  localize  the 
reasons  for  poor  performance.  This  type  of  behavior  is  essential  for 
resolution  of  the  credit  assignment  problem  described  by  Minsky  [21].  In 
its  role  as  diagnostician,  the  critic  is  exemplified  by  the  bug 
classifier  and  summarizer  in  Sussman's  HACKER  [31]. 

Finally  the  critic  may  be  able  to  operate  as  a therapist,  and  make 
specific  recommendations  for  improvement  or  suggestions  about  future 
instances.  In  Waterman's  poker  player  [35],  the  critic  as  therapist 
suggests  the  bet  that  should  have  been  made  by  the  performance  element 
fon  a particular  training  instance. 

Not  all  systems  exhibit  sufficiently  complex  behavior  to  warrant 
critics  that  fulfill  all  three  functions.  The  critic  as  therapist  in 
particular  is  not  often  seen  in  simple  systems. 

The  dividing  line  between  critic  and  learning  element  is  difficult 
to  distinguish,  and  it  is  certainly  possible  to  view  therapy  as  a 
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In  some  LS's  the  functions  of  the  critic  have  been  left  to  the 
human  who  uses  the  system.  For  example,  MYCIN/TEIRESIAS  [6]  uses  a 
human  critic,  acting  as  evaluator,  diagnostician,  and  therapist  to 
suggest  alterations  to  its  rule  base. 

3.6  Learning  Element 

The  learning  element  is  an  interface  between  the  critic  and  the 
performance  element,  responsible  for  translating  the  abstract 
recommendations  of  the  critic  into  specific  changes  in  the  rules  or 
parameters  used  by  the  performance  element. 

Representations  for  learned  information  exhibit  great  variety. 
They  include,  for  example  production  rules  [35],  parameterized 
polynomials  [26],  executable  procedures  [31],  signature  tables  [27], 
stored  facts  [9],  and  graphs  [37].  The  method  of  incorporating  new 
learned  information  is  dependent  upon  this  representation,  and  even 
among  systems  which  use  similar  representations,  competing  methods  are 
found  (contrast,  for  example,  [3]  and  [35]). 

The  extent  to  which  the  learned  information  is  altered  in  response 
to  each  training  instance  is  an  important  LS  design  consideration.  In 
some  systems  [37],  the  learning  element  incorporates  exactly  the 
information  supplied  by  the  critic.  Were  the  same  training  instance  to 
occur  later,  the  response  of  the  performance  element  would  be  exactly  as 
the  critic  advised  for  the  first  occurrence.  This  type  of  learning  is 
well  suited  to  environments  which  provide  perfect  data  and  to  systems 
with  reliable  critics.  Under  these  conditions  the  LS  will  converge 
rapidly  toward  the  desired  behavior.  If  such  a system  were  provided 
with  an  incorrect  classification  by  the  environment  or  less  than 
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reliable  advice  by  the  critic,  however,  it  might  commit  itself  to 
incorrect  assumptions  from  which  it  is  difficult  to  recover.  Systems 
which  make  less  drastic  changes  to  the  learned  knowledge  on  the  basis  of 
a single  training  instance  are  less  vulnerable  to  imperfect  Information, 
but  consequently  require  more  training  instances  to  converge  to  the 
desired  behavior.  Many  statistical  LS's  fall  into  this  category  [231. 
Other  systems  consider  several  training  Instances  at  a time  in  order  to 
minimize  the  effect  of  occasional  noisy  instances  [3]- 

3.7  Blackboard 

The  blackboard  of  our  model  is  an  extension  of  the  concept 
introduced  in  the  HEARSAY  system  [18],  functioning  as  both  global  data 
base  and  communications  mechanism.  The  blackboard  holds  two  types  of 
information:  the  information  usually  associated  with  the  "knowledge 
base"  in  AI  programs,  and  the  temporary  information  used  by  the  LS 
components.  The  knowledge  base  often  contains  the  set  of  rules, 
parameter  values,  symbolic  structures,  and  so  on,  currently  being  used 
by  the  performance  element.  Such  Information  can  be  used  as  an  aid  to 
sophisticated  instance  selection  if  it  is  readily  available.  The 
temporary,  system-oriented  information  includes,  for  example,  the 
internal  decisions  made  by  the  performance  element  in  selecting  a 
particular  response.  Detailed  criticism  by  the  critic  is  dependent  upon 
the  availability  of  this  information,  written  by  the  performance 
element . 

In  many  existing  systems  this  information  is  not  so  clearly 
separated  or  defined.  The  communication  links  between  functional 
comonnents,  especially,  are  often  programmed  directly.  Such  a non- 
modular  approach  is  known  to  lead  to  difficulties  when  redesign  is 
attempted  [2]. 


World  Model 

Whereas  the  blackboard  contains  Information  that  can  be  altered  by 


the  LS  components,  the  world  model  contains  the  fixed  conceptual 
framework  within  which  the  system  operates.  The  contents  of  the  world 
model  include  definitions  of  objects  and  relations  in  the  task  domain, 
the  syntax  and  semantics  of  the  information  to  be  learned,  and  the 
methods  to  be  used  by  the  LS.  Again,  there  are  no  clear  lines  of 
separation  between  the  world  model  and  the  other  parts  of  the  LS.  Our 
working  definition  is  that  the  world  model  contains  all  definitions, 
parameters,  vocabulary,  and  assumptions  that  are  available  for 
modification.  (Insofar  as  the  designer  can  change  any  piece  of  the  LS  we 
suggest  separating  those  that  are  easily  modified  from  the  rest.  See 
for  a philosophical  treatment  of  this  issue,  and  [5]  for  the 
discussion  that  led  to  our  including  a world  model.)  Among  task  domain 
definitions  are,  for  example,  the  rules  of  a game,  and  the 
representation  of  inputs  and  outputs  for  the  performance  element.  This 
part  of  the  world  model  simply  defihes  the  task  of  the  performance 
element,  and  the  standard  of  performance  (the  evaluation  function)  to  be 
applied  by  the  critic.  Definitions  of  the  syntax  and  semantics  of 
information  to  be  learned  define  the  mode  of  communication  between  the 
learning  and  performance  elements. 

The  world  model  may  include  several  additional  items.  Some 
systems  require  a model  for  translating  input  data  into  the  specific 
training  instances  to  be  used.  For  example,  the  controller  in  [17] 
preprocesses  inputs  in  a control  system,  and  the  first  part  of  Meta- 
DENDRAL  [3]  translates  each  input  molecule/data-point  pair  into 
plausible  molecule/process  pairs  under  a simple  theory  of  the  task 
domain.  Domain  specific  heuristics  are  also  commonly  added  to  the  world 
model  of  AI  systems  to  guide  inferences  made  by  the  LS  (e.g.,  the  blocks 
world  heuristics  of  Winston's  program  [3'^])- 

Although  the  world  model  cannot  be  altered  by  the  LS  that  uses  it, 
the  designer  can  alter  its  contents  in  order  to  improve  LS  performance. 
He  often  changes  parameters  and  procedures  of  the  basic  LS  after 
observing  and  criticizing  its  behavior  for  some  carefully  chosen 
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training  set.  These  alterations  result  in  a new  version  of  the  LS,  which 
is  then  tested  on  some  training  set,  and  so  on.  The  designer  views  the 
whole  LS  as  a system  whose  performance  needs  improvement,  and  he  selects 
instances,  criticizes  performance,  and  makes  changes  accordingly.  In 
other  words,  the  designer's  activities  can  be  modeled  by  a system  whose 
components  are  just  those  in  Figure  1.  This  leads  us  to  the  concept  of 
layered  LS's,  each  higher  layer  able  to  change  the  world  model 
(vocabulary,  assumptions,  etc.)  of  the  next  lower  layer  on  the  basis  of 
criticizing  its  performance  on  a chosen  set  of  instances.  Thus, 
adjustments  can  be  made  to  the  world  model  of  some  learning  system  LSI 
by  another  learning  system,  LS2,  which  has  its  own  functional  components 
(critic,  world  model,  etc.).  In  turn,  it  is  conceivable  that  a third 
system,  LS3,  could  adjust  the  world  model  of  LS2,  and  so  on.  The  final 
critic,  however,  is  the  designer,  operating  outside  of  the  "top-level" 
LS. 

One  existing  LS  which  may  be  viewed  as  a layered  system  is  the 
version  of  Samuel's  program  [27]  which  learns  a polynomial  evaluation 
function  for  selecting  checkers  moves  (see  Appendix  I for  details). 
The  lower  layer  (LSI)  in  this  system  adjusts  the  coefficients  of  a given 
set  of  game  board  features  in  order  to  improve  performance  of  the  move 
selection  program.  The  second  layer  system  (LS2)  adjusts  the  set  of 
board  features  used  in  the  evaluation  function  in  order  to  improve  the 
learning  performance  of  LSI.  Since  LSI  is  contained  in  LS2  as  the 
performance  element,  all  the  assumptions  necessary  for  its  operation 
also  belong  to  the  LS2  world  model.  In  addition,  the  LS2  world  model 
contains  assumptions  about  the  set  of  allowable  game  board  features  and 
the  standard  for  evaluating  LSI  performance. 

A single  layer  LS,  then,  can  never  move  outside  its  world  model  to 
make  radical  revisions  to  its  way  of  viewing  the  task  to  achieve  a 
"paradigm  shift",  as  discussed  by  Kuhn  [16].  However,  a shift  in  the 
conceptual  framework  of  LSI  could  be  made  by  a properly  programmed  LS2 
[3].  We  believe  that  a layered  approach  such  as  that  described  above 
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provides  a useful  system  organization  for  learning  at  various  levels  of 
abstraction  in  complex  domains.  Although  there  are  examples  of  this 
kind  of  layering  in  the  literature  [26]  [3^],  no  one  has  carried  it  as 
far  as  our  model  suggests,  and  it  appears  that  we  are  just  now  reaching 
the  point  of  understanding  single  layer  learning  systems  well  enough  to 
consider  developing  more  sophisticated  systems. 


4 Summary 

The  proposed  LS  model  provides  a common  vocabulary  for  describing 
different  types  of  learning  systems  which  operate  in  a variety  of  task 
domains.  It  encourages  classification  and  comparison  of  LS's  and  helps 
identify  unique  or  strong  features  of  individual  systems.  We  believe 
the  model  is  a useful  conceptual  guide  for  LS  design,  because  it 
isolates  the  essential  functional  components,  and  the  information  that 
must  be  available  to  these  components.  The  model  also  suggests  a 
layered  architecture  for  learning  at  different  levels  of  abstraction. 
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Appendix  I 

Characterization  of  Existing  Systems  in  Terms  of  the  Model 

In  this  appendix  several  existing  LS's  are  characterized  using  the 
framework  provided  by  the  model  described  in  Section  3.  The  systems 
selected  are  representative  of  several  approaches  to  machine  learning. 

Because  the  blackboard  contains  information  in  a state  of  flux,  its 
contents  are  not  specified  explicitly  for  the  systems  characterized 
below. 

Meta-DENDRAL,  Buchanan,  et  al.  [3] 

Domain:  Mass  spectrometry. 

Purpose;  Learn  to  predict  data  points  in  the  mass  spectra  of 
molecules . 

Environment:  Set  of  all  known  molecule/data-point  pairs. 

Performance  Element:  Predicts  peaks  (data  points)  in  mass-spectra  of  , 

molecules  using  learned  production  rules.  Employs  a model  of 
mass  spectrometry  for  translating  between  mass-spectral 
processes  (predicted  by  the  rules)  and  data  points  in  the 
spectrum. 

Instance  Selector:  Accepts  a set  of  known  molecule/spectrum  pairs 

from  the  user. 

Critic:  Evaluation  - determines  the  suitability  of  the  set  of 
predictions  generated  by  a rule.  Diagnosis  - states  whether  the 

rule  is  acceptable,  too  specific,  or  too  general.  Therapy  - i 

recommends  adding  or  deleting  features  to  the  left-hand  sides  of  i 

rules. 

Learning  Element:  Conducts  a heuristic  search  through  the  space  of  ^ 

plausible  rules  using  a predefined  rule  generator.  At  each  step  ^ 

in  the  search  the  potential  rule's  performance  is  reviewed  by  | 


the  critic. 
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World  Model:  Representation  of  molecules  as  graphs,  production  rule 
model  of  mass  spectrometry,  vocabulary  of  rules  used  to 
represent  learned  information;  heuristics  used  by  the  critic  in 
directing  the  rule  search. 


Program  to  Learn  Structural  Descriptions  from  Examples,  Winston  [37] 

Domain:  Blocks  world.  j 

Purpose:  Learn  to  identify  blocks  world  structures  (such  as  arches  ' 

and  towers). 

Environment:  Set  of  possible  line  drawing/structure-classification 
pairs . 

Performance  Element:  Decides  class  of  structures  to  which  the  input 
structure  belongs.  Uses  a model  of  the  structure  class  supplied 

by  the  learning  element.  ' 

Instance  Selector:  Accepts  training  instances  supplied  individually 
by  the  user. 

Critic:  Evaluation  - compares  the  classification  made  by  the 

Performance  Element  against  the  correct  classification  as  , 

supplied  with  each  training  instance.  Diagnosis  - generates  a 

comparison  description  pointing  out  differences  between  the  j 

model  and  the  structure  description. 

Learning  Element:  Constructs  a model  of  the  class  of  structures 
under  consideration.  Examines  the  comparison  description 

supplied  by  the  critic,  and  modifies  the  model  to  strengthen  or 

weaken  the  correspondence  between  the  model  and  the  training  j 

instance . 

World  Model:  Representation  of  scenes  as  line  drawings,  method  of 

translating  line  drawings  to  graphical  descriptions,  grammar  for 
representing  the  learned  information,  domain-specific  heuristics 
for  resolving  among  possible  changes  to  each  structure  class 
model . 
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Checker  Player,  Samuel  [26]  [27] 

Domain:  Game  of  checkers. 

Purpose:  Learn  to  play  good  game  of  checkers  (here  we  discuss  only 
the  version  of  the  program  which  learns  a linear  polynomial 
evaluation  function  by  examination  of  moves  suggested  by  experts 
("book  moves"). 

Environment:  Set  of  all  legal  game  boards. 

LSI  (lowest  layer): 

Purpose:  Learn  a good  set  of  coefficients  for  combining  board 
features  in  a linear  polynomial  evaluation  function. 

Performance  Element:  Uses  the  learned  evaluation  function  to  rank 

plausible  moves  for  a given  board  position. 

Instance  Selector:  Reads  instances  from  a list  of  pre-deflned  game- 
board/recommended-move  pairs. 

Critic:  Evaluation  - examines  the  ranking  given  to  the  book  move  by 
the  performance  element.  Diagnosis  - suggests  that  the  book  move 
should  be  ranked  above  all  other  moves. 

Learning  Element;  Adjusts  weights  of  linear  polynomial  to  make  move 
selection  correspond  to  the  critic's  recommendation. 

World  Model;  Syntax  of  game  board,  form  and  features  of  linear 
polynomial  evaluation  function,  method  for  adjusting  evaluation 
function,  and  rules  of  checkers. 

LS2: 

Purpose:  Improve  the  performance  of  LSI  by  selection  of  a good  set 
of  board  features. 

Performance  Element:  LSI. 

Instance  Selector;  The  entire  set  of  possible  training  instances  is 
simply  passed  to  LSI  (via  the  blackboard). 

Critic:  Evaluation  - analyses  the  learning  ability  of  LSI  (i.e.,  the 
LS2  performance  element)  with  the  current  set  of  evaluation 
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function  features.  Diagnosis  - singles  out  features  which  are 
not  useful.  Therapy  - selects  new  features  from  a predefined 
list  to  replace  useless  features. 

Learning  Element;  Redefines  the  current  set  of  features  as 
recommended  by  the  critic. 

World  Model:  The  LSI  world  model,  plus  the  set  of  features  which  may 
be  considered,  and  the  performance  standard  employed  by  the  LS2 
critic. 


Poker  Player,  Waterman  [35] 

Domain:  Draw  poker. 

Purpose;  Learn  a good  strategy  for  making  bets  in  draw  poker. 

Environment:  Set  of  all  legal  poker  game  states. 

Performance  Element;  Applies  the  learned  production  rules  to 
generate  actions  in  a poker  game,  e.g.,  bets. 

Instance  Selector:  Selects  each  game  state  derived  by  play  against 
an  opponent  as  a training  instance. 

Critic:  Two  versions  of  the  program  use  two  different  critics.  In 
both  cases  the  critic  performs  the  following  functions: 
Evaluation  - decides  whether  the  poker  bet  made  by  the 

Performance  Element  was  acceptable.  Diagnosis  - gives  important 
state  variables  for  deciding  the  correct  bet.  Therapy  - provides 

the  bet  which  the  Performance  Element  should  have  made.  In 

"explicit"  learning  the  critic  is  an  expert  poker  player  , 
either  human  or  programmed.  In  "implicit"  learning,  the 

evaluation  and  therapy  are  deduced  from  the  next  action  of  the 
opponent  and  a set  of  predefined  axioms,  while  diagnosis  is  read 
from  a predefined  "decision  matrix". 

Learning  Element;  Modifies  and  adds  production  rules  to  the  system. 
Mistakes  are  corrected  by  adding  a new  rule  in  front  of  the  rule 
responsible  for  the  incorrect  response. 
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World  Model:  Rules  of  poker,  features  used  to  describe  the  game 
state,  the  language  of  production  rules,  heuristics  for  updating 
the  rule  base,  the  model  of  an  opponent. 


Model  Reference  Adaptive  Control,  e.g.  Landau  [17] 

Domain:  Control  Systems. 

Purpose:  Construct  a "controller"  which  preprocesses  inputs  to  an 
existing  system  (called  the  "plant").  The  behavior  of  the 
combined  controller-plant  system  is  to  mimic  the  behavior  of  a 
third  system  (called  the  "reference  model")  on  the  training 
data . 

Environment:  The  plant  to  be  controlled,  and  the  set  of  possible 
inputs  (including  disturbances). 

Performance  Element:  The  controller  - a system  whose  output  is  used 
as  input  to  the  plant.  Its  behavior  is  a function  of  the  input 
signal,  past  I/O  behavior  of  the  plant,  and  a set  of  adjustable 
parameters . 

Instance  Selector:  Accepts  data  sequence  (as  input  to  the 

controller)  from  the  environment. 

Critic:  Evaluation  - applies  a measure  of  performance  which  is  some 
function  of  the  arithmetic  difference  between  the  plant  and 
reference  model  outputs.  In  some  cases  the  reference  model  is 
mathematically  defined,  and  can  therefore  be  considered  part  of 
the  critic.  In  other  cases  the  reference  model  is  an  actual 
system,  and  is  considered  part  of  the  environment. 

Lt.arning  Element:  Modifies  the  parameters  of  the  performance  element 
(controller),  depending  on  the  performance  measure  supplied  by 
the  critic. 

World  Model:  Control  theory  assumptions  (time  invariance,  linearity, 
etc.)  and  technique.”,  and  the  standard  of  performance  embodied 
in  the  critic. 
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